
Conversation

@ajay-mk
Member

@ajay-mk ajay-mk commented Jul 14, 2025

This PR introduces the ability to compare benchmark performance on CI for PRs.

There are two parts to the workflow:

  1. A Python script, benchmark_compare.py, which compares the performance of two commits using their benchmark outputs (a minimal sketch of the comparison step follows this list).
    How to use:
python3 benchmark_compare.py abc123 def456
python3 benchmark_compare.py master your/feature/branch

The script is self-contained and can be run locally. All build-related variables (e.g., CMake flags) are defined inside the script.

  2. A benchmark comparison workflow that can be triggered through a comment on the PR or by manual dispatch. The workflow uses the Python script to compare the benchmark outputs of the base and head commits and reports the results as a comment via the github-actions bot. (See example at Expt: Dummy Branch ajay-mk/SeQuant#4 (comment))
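
For reference, here is a minimal sketch of just the comparison step (not the actual script from this PR), assuming Google Benchmark's JSON output schema, i.e. a top-level "benchmarks" array whose entries carry "name", "cpu_time", and "real_time":

    # Minimal sketch: compare two Google Benchmark JSON outputs by name.
    import json
    import sys

    def load_results(path, metric="cpu_time"):
        """Map benchmark name -> measured time for the chosen metric."""
        with open(path) as f:
            data = json.load(f)
        return {b["name"]: b[metric] for b in data["benchmarks"]}

    def compare_benchmarks(base_path, new_path, metric="cpu_time"):
        base = load_results(base_path, metric)
        new = load_results(new_path, metric)
        # Only benchmarks present in both runs can be compared.
        for name in sorted(base.keys() & new.keys()):
            diff = new[name] - base[name]
            pct = 100.0 * diff / base[name] if base[name] else float("nan")
            print(f"{name}: {base[name]:.1f} -> {new[name]:.1f} ({pct:+.1f}%)")

    if __name__ == "__main__":
        compare_benchmarks(sys.argv[1], sys.argv[2])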

Caveats: This isn't 100% foolproof. GitHub-hosted runners can produce noisy benchmark results because of shared hardware. From the testing I did, this happens only rarely; it's not perfect, but it should help us catch major performance regressions.
Notes: Ideally, we should run this on a self-hosted runner to get reliable numbers, but GitHub advises against that for public repos because of security concerns.

This is my first time wiring up a somewhat complicated CI workflow, so I'm open to suggestions or improvements.

@ajay-mk ajay-mk requested a review from Copilot July 14, 2025 14:49


Why: When writing JSON/CSV, the standard locale set by `set_locale` breaks the JSON/CSV structure because large numbers get thousands-separator commas.
@ajay-mk ajay-mk force-pushed the ajay/feature/benchmark-workflow branch from f96974e to 71ba004 Compare July 14, 2025 14:54
ajay-mk added 4 commits July 14, 2025 10:55
The presence of commas in numbers breaks JSON/CSV output.
This script can be used to compare the performance of two commits. It checks out, builds, and compares the benchmark outputs of both commits.

How to use:
python3 benchmark_compare.py abc123 def456
python3 benchmark_compare.py master your/feature/branch
This is a workflow that can be triggered by commenting "/benchmark" on PRs (or through manual dispatch).

It uses `benchmark_compare.py` to run and compare benchmarks. The comparison result is posted as a comment on the PR.
@ajay-mk ajay-mk force-pushed the ajay/feature/benchmark-workflow branch from 71ba004 to bfe6010 Compare July 14, 2025 14:55
@Krzmbrzl
Collaborator

Google Benchmark ships with a dedicated Python script to compare benchmarks. I recommend making use of that rather than rewriting something similar ourselves.

Additionally, I think we should run the entire benchmark and not only the CC one.

@ajay-mk
Member Author

ajay-mk commented Jul 14, 2025

Google Benchmark ships with a dedicated Python script to compare benchmarks. I recommend making use of that rather than rewriting something similar ourselves.

I did see that. It outputs just the differences; I wanted to see the percentage differences also. I will check if there is a way to reuse it.

Additionally, I think we should run the entire benchmark and not only the CC one.

Yes, we do run the sequant_benchmarks target in this case. I added the cc target just for convenience.

ajay-mk and others added 3 commits July 16, 2025 14:47
- When benchmarking with branch names, the presence of '/' can cause issues, so replace it with '-' (see the sketch below)
- Output filename changed to "benchmark_comparison.txt"
- Add the ability to import the compare_benchmarks function for independent use
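
For illustration, the sanitization amounts to something like this (the helper name below is hypothetical):

    # Hypothetical helper: make a git ref safe for use in an output filename.
    def ref_to_filename(ref: str) -> str:
        return ref.replace("/", "-")

    assert ref_to_filename("your/feature/branch") == "your-feature-branch"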
@ajay-mk
Member Author

ajay-mk commented Jul 17, 2025

I have made a couple of changes:

  • Fixed an issue with benchmark output naming
  • The output file is now called benchmark-comparison.txt
  • Also made the compare_benchmarks function more reusable (a usage sketch follows this list). I can separate it into another file if needed; then, if we have two benchmark outputs, we can just call python3 compare.py base.json new.json
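
If it is split out like that, importing it from another script could look like the following (the signature is assumed here, mirroring the sketch earlier in this thread):

    # Assumed signature: compare_benchmarks(base_json, new_json, metric).
    from benchmark_compare import compare_benchmarks

    compare_benchmarks("base.json", "new.json", metric="cpu_time")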

@ajay-mk ajay-mk requested a review from Copilot July 17, 2025 17:37

Copilot AI left a comment


Pull Request Overview

This PR adds CLI and CI support for benchmarking performance regressions, including locale formatting controls in the runtime, a custom CMake target for a specific benchmark filter, and a GitHub Actions workflow to compare benchmark outputs between two commits.

  • Introduce disable_thousands_separator API and call it in benchmarks/main.cpp to ensure no digit grouping in JSON/CSV outputs.
  • Add sequant_benchmark_cc custom CMake target for running only the cc_full benchmark.
  • Create a workflow (benchmark_compare.yml) that triggers on PR comments or manual dispatch to run comparisons via benchmark_compare.py.

Reviewed Changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 comments.

Summary per file:
  • benchmarks/main.cpp: Call disable_thousands_separator() right after setting the locale
  • benchmarks/CMakeLists.txt: Add sequant_benchmark_cc custom target to run the cc_full filter
  • SeQuant/core/runtime.hpp: Declare disable_thousands_separator() in the public API
  • SeQuant/core/runtime.cpp: Implement disable_thousands_separator() via locale facet overrides
  • .github/workflows/benchmark_compare.yml: Add workflow to compare benchmark outputs and post results as a comment
Comments suppressed due to low confidence (3)

.github/workflows/benchmark_compare.yml:131

  • The workflow uploads ${base_sha}-${head_sha}-comparison.txt but reads benchmark-comparison.txt. Update the filename to match the actual uploaded artifact path or use the dynamic names when reading.
            const comparisonFile = `benchmark-comparison.txt`;

benchmarks/CMakeLists.txt:25

  • [nitpick] The custom target name sequant_benchmark_cc is ambiguous—consider aligning naming with sequant_benchmarks_cc or adding a more descriptive suffix to clarify it runs only the cc_full benchmark.
add_custom_target(sequant_benchmark_cc

benchmarks/main.cpp:13

  • [nitpick] Consider adding a brief inline comment explaining why disabling the thousands separator is needed in this context, e.g., to ensure CSV/JSON parsers receive ungrouped numbers.
  disable_thousands_separator();

@ajay-mk ajay-mk force-pushed the ajay/feature/benchmark-workflow branch from 2107abf to 9667369 Compare July 17, 2025 17:43
@ajay-mk ajay-mk marked this pull request as ready for review July 17, 2025 18:19
@Krzmbrzl
Collaborator

I wanted to see the percentage differences also

It should be pretty straightforward to compute these from the absolute differences, no? The benefit of making use of the "built-in" comparison would be that that script (almost certainly) will get updated along with any potential file format changes of the benchmark output.
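
For reference, the arithmetic in question is just the ratio of the absolute difference to the baseline:

    # Relative change, in percent, from a baseline value and an absolute difference.
    def pct_change(baseline: float, abs_diff: float) -> float:
        return 100.0 * abs_diff / baseline

    print(pct_change(200.0, 10.0))  # 5.0, i.e. 5% slower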

Collaborator

@Krzmbrzl Krzmbrzl left a comment


👀

Comment on lines +54 to +55
if metric not in ["cpu_time", "real_time"]:
raise ValueError("Invalid metric specified. Use 'cpu_time' or 'real_time'.")
Collaborator


I would recommend using an enum for the metric. This will make it obvious which values are valid.
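
A sketch of that suggestion in Python:

    from enum import Enum

    class Metric(Enum):
        CPU_TIME = "cpu_time"
        REAL_TIME = "real_time"

    # Enum lookup by value raises ValueError for any other string,
    # so the manual membership check becomes unnecessary.
    metric = Metric("cpu_time")    # -> Metric.CPU_TIME
    # Metric("wall_time")          # -> ValueError: 'wall_time' is not a valid Metric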

@ajay-mk ajay-mk marked this pull request as draft October 1, 2025 03:41